
What legal experts say about second US strike on Venezuela boat

BBC News

Several legal experts have told BBC Verify that the second strike on an alleged Venezuelan drug boat by the US military was probably illegal and would likely be considered an extrajudicial killing under international law. On Monday, the Trump administration confirmed that a follow-up strike on the boat - which has been criticised as a "double tap" - was ordered by US Navy Admiral Frank Bradley, with the overall operation having been authorised by War Secretary Pete Hegseth. Nine people died in the first strike on the vessel, and two survivors were left clinging to the burning wreckage when it was struck again, killing them, according to the Washington Post. A US official has said four missiles were used in the operation. The Trump administration has not denied there were survivors and has insisted the strikes on 2 September were in accordance with the law of armed conflict.





Summarisation of German Judgments in conjunction with a Class-based Evaluation

Steffes, Bianca, Wiedemann, Nils Torben, Gratz, Alexander, Hochreither, Pamela, Meyer, Jana Elina, Schilke, Katharina Luise

arXiv.org Artificial Intelligence

The automated summarisation of long legal documents can be a great aid for legal experts in their daily work. We automatically create summaries (guiding principles) of German judgments by fine-tuning a decoder-based large language model. We enrich the judgments with information about legal entities before the training. For the evaluation of the created summaries, we define a set of evaluation classes which allows us to measure their language, pertinence, completeness and correctness. Our results show that employing legal entities helps the generative model to find the relevant content, but the quality of the created summaries is not yet sufficient for use in practice.
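The abstract above mentions enriching judgments with legal-entity information before fine-tuning. A minimal sketch of what such a pre-processing step might look like is shown below; the marker tokens and the statute-citation pattern are illustrative assumptions, not the paper's actual entity scheme.

```python
import re

# Illustrative pattern for German statute citations such as
# "§ 433 BGB" or "§ 433 Abs. 2 BGB" (assumption, not the paper's scheme).
STATUTE_RE = re.compile(
    r"§+\s*\d+[a-z]?(?:\s+Abs\.\s*\d+)?\s+[A-ZÄÖÜ][A-Za-zÄÖÜäöü]*"
)

def mark_legal_entities(text: str) -> str:
    """Wrap statute citations in <LE> ... </LE> marker tokens so a
    fine-tuned model can attend to them during summarisation."""
    return STATUTE_RE.sub(lambda m: f"<LE>{m.group(0)}</LE>", text)

print(mark_legal_entities("Der Anspruch folgt aus § 433 Abs. 2 BGB."))
```

The marked-up text would then be tokenised (with `<LE>`/`</LE>` added as special tokens) and used as training input alongside the reference guiding principles.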


Aligning Language Models for Icelandic Legal Text Summarization

Harðarson, Þórir Hrafn, Loftsson, Hrafn, Ólafsson, Stefán

arXiv.org Artificial Intelligence

The integration of language models in the legal domain holds considerable promise for streamlining processes and improving efficiency in managing extensive workloads. However, the specialized terminology, nuanced language, and formal style of legal texts can present substantial challenges. This study examines whether preference-based training techniques, specifically Reinforcement Learning from Human Feedback and Direct Preference Optimization, can enhance models' performance in generating Icelandic legal summaries that align with domain-specific language standards and user preferences. We compare models fine-tuned with preference training to those using conventional supervised learning. Results indicate that preference training improves the legal accuracy of generated summaries over standard fine-tuning but does not significantly enhance the overall quality of Icelandic language usage. Discrepancies between automated metrics and human evaluations further underscore the importance of qualitative assessment in developing language models for the legal domain.
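The study above compares Direct Preference Optimization against standard supervised fine-tuning. As context, the core DPO objective for a single preference pair can be sketched as below; the per-sequence scalar formulation is a simplification of the batched, token-level implementation used in practice.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid of the beta-scaled
    difference between the policy/reference log-ratios of the chosen
    and rejected summaries."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# When the policy matches the reference, the margin is 0 and the loss is log 2.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
```

Minimising this loss pushes the policy to raise the likelihood of preferred summaries relative to dispreferred ones, without training an explicit reward model as RLHF does.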


LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models

Li, Haitao, Chen, You, Ai, Qingyao, Wu, Yueyue, Zhang, Ruizhe, Liu, Yiqun

arXiv.org Artificial Intelligence

Large language models (LLMs) have made significant progress in natural language processing tasks and demonstrate considerable potential in the legal domain. However, legal applications demand high standards of accuracy, reliability, and fairness. Applying existing LLMs to legal systems without careful evaluation of their potential and limitations could pose significant risks in legal practice. To this end, we introduce LexEval, a standardized, comprehensive Chinese legal benchmark. This benchmark is notable in the following three aspects: (1) Ability Modeling: We propose a new taxonomy of legal cognitive abilities to organize different tasks. (2) Scale: To our knowledge, LexEval is currently the largest Chinese legal evaluation dataset, comprising 23 tasks and 14,150 questions. (3) Data: We utilize formatted existing datasets, exam datasets, and datasets newly annotated by legal experts to comprehensively evaluate the various capabilities of LLMs. LexEval not only focuses on the ability of LLMs to apply fundamental legal knowledge but also dedicates efforts to examining the ethical issues involved in their application. We evaluated 38 open-source and commercial LLMs and obtained some interesting findings. The experiments and findings offer valuable insights into the challenges and potential solutions for developing Chinese legal systems and LLM evaluation pipelines. The LexEval dataset and leaderboard are publicly available at https://github.com/CSHaitao/LexEval and will be continuously updated.
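A benchmark of this shape is typically scored per task and then aggregated along the ability taxonomy. The sketch below shows a generic per-task exact-match scorer; the field names and the exact-match metric are assumptions for illustration, not LexEval's actual evaluation protocol.

```python
from collections import defaultdict
from typing import Callable, Dict, List

def score_by_task(examples: List[dict],
                  predict: Callable[[str], str]) -> Dict[str, float]:
    """Exact-match accuracy per task. Each example is assumed to carry
    'task', 'question', and 'answer' fields (hypothetical schema)."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for ex in examples:
        total[ex["task"]] += 1
        if predict(ex["question"]) == ex["answer"]:
            correct[ex["task"]] += 1
    return {task: correct[task] / total[task] for task in total}
```

Per-task scores can then be averaged within each legal cognitive ability to produce the taxonomy-level results a leaderboard would report.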


Congratulations to the winners of the #AIES2024 best paper awards

AIHub

The Seventh AAAI/ACM Conference on AI, Ethics, and Society (AIES-24) was held in San Jose, California from October 21-23, 2024. During the opening session of the conference, the best paper award winners were announced. Abstract: In response to rising concerns surrounding the safety, security, and trustworthiness of Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red-teaming as a key component of their strategies for identifying and mitigating these risks. However, despite AI red-teaming's central role in policy discussions and corporate messaging, significant questions remain about what precisely it means, what role it can play in regulation, and how it relates to conventional red-teaming practices as originally conceived in the field of cybersecurity. In this work, we identify recent cases of red-teaming activities in the AI industry and conduct an extensive survey of relevant research literature to characterize the scope, structure, and criteria for AI red-teaming practices.


Rethinking Legal Judgement Prediction in a Realistic Scenario in the Era of Large Language Models

Nigam, Shubham Kumar, Deroy, Aniket, Maity, Subhankar, Bhattacharya, Arnab

arXiv.org Artificial Intelligence

This study investigates judgment prediction in a realistic scenario within the context of Indian judgments, utilizing a range of transformer-based models, including InLegalBERT, BERT, and XLNet, alongside LLMs such as Llama-2 and GPT-3.5 Turbo. In this realistic scenario, we simulate how judgments are predicted at the point when a case is presented for a decision in court, using only the information available at that time, such as the facts of the case, statutes, precedents, and arguments. This approach mimics real-world conditions, where decisions must be made without the benefit of hindsight, unlike the retrospective analyses often found in previous studies. For transformer models, we experiment with hierarchical transformers and the summarization of judgment facts to optimize input for these models. Our experiments with LLMs reveal that GPT-3.5 Turbo excels in realistic scenarios, demonstrating robust performance in judgment prediction. Furthermore, incorporating additional legal information, such as statutes and precedents, significantly improves the outcome of the prediction task. The LLMs also provide explanations for their predictions. To evaluate the quality of these predictions and explanations, we introduce two human evaluation metrics: Clarity and Linking. Our findings from both automatic and human evaluations indicate that, despite advancements in LLMs, they have yet to achieve expert-level performance in judgment prediction and explanation tasks.
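The "realistic scenario" described above amounts to constraining the model's input to pre-decision information only. A minimal sketch of such a prompt builder is given below; the section labels and the accept/reject instruction are illustrative assumptions, not the paper's exact template.

```python
from typing import List, Optional

def build_prompt(facts: str,
                 statutes: Optional[List[str]] = None,
                 precedents: Optional[List[str]] = None) -> str:
    """Assemble only information available before the decision (facts,
    statutes, precedents) into a prediction prompt; the judgment text
    itself is deliberately excluded to avoid hindsight leakage."""
    parts = [f"Facts:\n{facts}"]
    if statutes:
        parts.append("Statutes:\n" + "\n".join(statutes))
    if precedents:
        parts.append("Precedents:\n" + "\n".join(precedents))
    parts.append("Predict the judgment (accept/reject) and explain briefly.")
    return "\n\n".join(parts)
```

Keeping statutes and precedents as optional arguments makes it easy to run the ablation the abstract describes, i.e. measuring how much each source of additional legal information improves prediction.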